{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Think Bayes: Chapter 11\n",
    "\n",
    "This notebook presents code and exercises from Think Bayes, second edition.\n",
    "\n",
    "Copyright 2016 Allen B. Downey\n",
    "\n",
    "MIT License: https://opensource.org/licenses/MIT"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from __future__ import print_function, division\n",
    "\n",
    "% matplotlib inline\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "import math\n",
    "import numpy as np\n",
    "\n",
    "from thinkbayes2 import Pmf, Cdf, Suite, Joint\n",
    "import thinkplot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The Euro problem\n",
    "\n",
    "Problem statement here.\n",
    "\n",
    "Here's a more efficient version of the Euro class that takes the dataset in a more compact form and uses the binomial distribution (ignoring the binomial coefficient because it does not depend on `x`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "class Euro(Suite):\n",
    "    \"\"\"Represents hypotheses about the probability of heads.\"\"\"\n",
    "\n",
    "    def Likelihood(self, data, hypo):\n",
    "        \"\"\"Computes the likelihood of the data under the hypothesis.\n",
    "\n",
    "        hypo: integer value of x, the probability of heads (0-100)\n",
    "        data: tuple of (number of heads, number of tails)\n",
    "        \"\"\"\n",
    "        x = hypo / 100.0\n",
    "        heads, tails = data\n",
    "        like = x**heads * (1-x)**tails\n",
    "        return like"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we know the coin is fair, we can evaluate the likelihood of the data directly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "p(D|F) 5.52714787526e-76\n"
     ]
    }
   ],
   "source": [
    "data = 140, 110\n",
    "\n",
    "suite = Euro()\n",
    "like_f = suite.Likelihood(data, 50)\n",
    "print('p(D|F)', like_f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we cheat an pretend that the alternative hypothesis is exactly the observed proportion, we can compute the likelihood of the data and the likelihood ratio, relative to the fair coin. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "p(D|B_cheat) 3.35828998524e-75\n",
      "p(D|B_cheat) / p(D|F) 6.07599083837\n"
     ]
    }
   ],
   "source": [
    "actual_percent = 100.0 * 140 / 250\n",
    "likelihood = suite.Likelihood(data, actual_percent)\n",
    "print('p(D|B_cheat)', likelihood)\n",
    "print('p(D|B_cheat) / p(D|F)', likelihood / like_f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Under this interpretation, the data are in favor of \"biased\", with K=6.  But that's a total cheat.\n",
    "\n",
    "Suppose we think \"biased\" means either 0.4 or 0.6, but we're not sure which.  The total likelihood of the data is the weighted average of the two likelihoods."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "p(D|B_two) 7.35777272954e-76\n",
      "p(D|B_two) / p(D|F) 1.33120605701\n"
     ]
    }
   ],
   "source": [
    "like40 = suite.Likelihood(data, 40)\n",
    "like60 = suite.Likelihood(data, 60)\n",
    "likelihood = 0.5 * like40 + 0.5 * like60\n",
    "print('p(D|B_two)', likelihood)\n",
    "print('p(D|B_two) / p(D|F)', likelihood / like_f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Under this interpretation, the data are in favor of \"biased\", but very weak.\n",
    "\n",
    "More generally, if \"biased\" refers to a range of possibilities with different probabilities, the total likelihood of the data is the weighted sum:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def SuiteLikelihood(suite, data):\n",
    "    \"\"\"Computes the weighted average of likelihoods for sub-hypotheses.\n",
    "\n",
    "    suite: Suite that maps sub-hypotheses to probability\n",
    "    data: some representation of the data\n",
    "   \n",
    "    returns: float likelihood\n",
    "    \"\"\"\n",
    "    total = 0\n",
    "    for hypo, prob in suite.Items():\n",
    "        like = suite.Likelihood(data, hypo)\n",
    "        total += prob * like\n",
    "    return total"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's what it looks like if \"biased\" means \"equally likely to be any value between 0 and 1\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "p(D|B_uniform) 2.57964902292e-76\n",
      "p(D|B_uniform) / p(D|F) 0.466723359161\n"
     ]
    }
   ],
   "source": [
    "b_uniform = Euro(range(0, 101))\n",
    "b_uniform.Remove(50)\n",
    "b_uniform.Normalize()\n",
    "likelihood = SuiteLikelihood(b_uniform, data)\n",
    "print('p(D|B_uniform)', likelihood)\n",
    "print('p(D|B_uniform) / p(D|F)', likelihood / like_f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By that definition, the data are evidence against the biased hypothesis, with K=2.\n",
    "\n",
    "But maybe a triangle prior is a better model of what \"biased\" means."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def TrianglePrior():\n",
    "    \"\"\"Makes a Suite with a triangular prior.\"\"\"\n",
    "    suite = Euro()\n",
    "    for x in range(0, 51):\n",
    "        suite.Set(x, x)\n",
    "    for x in range(51, 101):\n",
    "        suite.Set(x, 100-x) \n",
    "    suite.Normalize()\n",
    "    return suite"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's what it looks like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "p(D|B_tri) 4.61720463665e-76\n",
      "p(D|B_tri) / p(D|F) 0.835368392677\n"
     ]
    }
   ],
   "source": [
    "b_tri = TrianglePrior()\n",
    "b_tri.Remove(50)\n",
    "b_tri.Normalize()\n",
    "likelihood = b_tri.Update(data)\n",
    "print('p(D|B_tri)', likelihood)\n",
    "print('p(D|B_tri) / p(D|F)', likelihood / like_f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By the triangle definition of \"biased\", the data are very weakly in favor of \"fair\".\n",
    "\n",
    "## Normalizing constant\n",
    "\n",
    "We don't really need the SuiteLikelihood function, because `Suite.Update` already computes the total probability of the data, which is the normalizing constant."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2.5796490229198124e-76"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "likelihood = SuiteLikelihood(b_uniform, data)\n",
    "likelihood"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2.579649022919812e-76"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "euro = Euro(b_uniform)\n",
    "euro.Update(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2.4249639967038797e-75"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "likelihood = SuiteLikelihood(b_tri, data)\n",
    "likelihood"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2.4249639967038797e-75"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "euro = Euro(b_tri)\n",
    "euro.Update(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This observation is the basis of hierarchical Bayesian models, of which this solution to the Euro problem is a simple example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
